Technical Q&A QA1235
Converting to Precomposed Unicode


Q: Is there a way to convert a Unicode string to its precomposed form?

A: Yes. APIs introduced in Mac OS X 10.2 let you convert a string to precomposed Unicode. This Q&A explains the difference between precomposed and decomposed Unicode, describes why you might need to convert to precomposed Unicode, and shows how to do the conversion.

Precomposed Versus Decomposed Characters

Certain Unicode characters can be encoded in more than one way. For example, Á (A acute) can be encoded either as a precomposed character, U+00C1 (LATIN CAPITAL LETTER A WITH ACUTE), or as a decomposed sequence, U+0041 U+0301 (LATIN CAPITAL LETTER A followed by COMBINING ACUTE ACCENT). Precomposed characters are more common in the Windows world, whereas decomposed characters are more common on the Mac.
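To make the difference concrete, here is a minimal, portable C sketch of the two encodings of A acute as UTF-16 code units. The names kAAcutePrecomposed, kAAcuteDecomposed, and SameCodeUnits are hypothetical and chosen only for this illustration. It shows that a naive code-unit comparison, with no normalization, treats the two forms as different strings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The same user-visible character, A acute, in its two encodings (UTF-16). */
static const uint16_t kAAcutePrecomposed[] = { 0x00C1 };         /* LATIN CAPITAL LETTER A WITH ACUTE */
static const uint16_t kAAcuteDecomposed[]  = { 0x0041, 0x0301 }; /* A + COMBINING ACUTE ACCENT */

/* Hypothetical helper: compares raw code units without any normalization.
   Returns 1 when both buffers hold identical code units, 0 otherwise. */
static int SameCodeUnits(const uint16_t *a, size_t aCount,
                         const uint16_t *b, size_t bCount)
{
    assert(a != NULL);
    assert(b != NULL);

    return aCount == bCount
        && memcmp(a, b, aCount * sizeof(uint16_t)) == 0;
}
```

Calling SameCodeUnits(kAAcutePrecomposed, 1, kAAcuteDecomposed, 2) returns 0 even though both buffers represent the same character, which is why code that compares or exchanges raw Unicode must agree on one form.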

As you work with Mac OS, you will encounter a mix of precomposed and decomposed Unicode. For example, HFS Plus converts all file names to decomposed Unicode, whereas Macintosh keyboards generally produce precomposed Unicode. This is not a problem as long as you process text with system-provided APIs: Apple's APIs handle both precomposed and decomposed Unicode correctly.

However, when you interact with non-Macintosh platforms, you may need to convert to precomposed Unicode. For example, each of the following is a legitimate reason to convert to precomposed Unicode.

  • You are implementing a network protocol that is defined to use precomposed Unicode.
  • You are creating a cross-platform file (or volume) whose specification dictates precomposed Unicode.
  • You are incorporating a large body of cross-platform code into your application, and that code expects precomposed Unicode.

Important:
Do not convert to precomposed Unicode in order to simplify your text processing. Precomposed Unicode can still contain combining characters. For example, there is no precomposed equivalent of U+0065 U+030A (LATIN SMALL LETTER E followed by COMBINING RING ABOVE), so converting to precomposed form gains you nothing in that respect.
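The point above can be illustrated with a toy composer in portable C. This is only a sketch under stated assumptions: the table kPairs holds three real canonical compositions but is nowhere near the full Unicode data, the names ComposePair, kPairs, and ToyCompose are hypothetical, and the function is not a substitute for a real normalizer. It composes a base letter and a following combining mark when the table has an entry, and leaves the pair untouched otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t base;      /* base letter */
    uint16_t mark;      /* combining mark that follows it */
    uint16_t composed;  /* precomposed equivalent */
} ComposePair;

/* Toy table for illustration only; real normalization uses the full
   Unicode character data. Note that e + ring above has no entry,
   because no precomposed equivalent exists. */
static const ComposePair kPairs[] = {
    { 0x0041, 0x0301, 0x00C1 },  /* A + COMBINING ACUTE ACCENT */
    { 0x0065, 0x0301, 0x00E9 },  /* e + COMBINING ACUTE ACCENT */
    { 0x0061, 0x030A, 0x00E5 },  /* a + COMBINING RING ABOVE   */
};

/* Hypothetical helper: composes adjacent base/mark pairs found in kPairs.
   The output is never longer than the input, so outCapacity must be at
   least inCount. Returns the number of code units written. */
static size_t ToyCompose(const uint16_t *in, size_t inCount,
                         uint16_t *out, size_t outCapacity)
{
    size_t i = 0;
    size_t n = 0;

    assert(in  != NULL);
    assert(out != NULL);
    assert(outCapacity >= inCount);

    while (i < inCount) {
        uint16_t unit     = in[i];
        size_t   consumed = 1;

        if (i + 1 < inCount) {
            for (size_t p = 0; p < sizeof(kPairs) / sizeof(kPairs[0]); p++) {
                if (kPairs[p].base == in[i] && kPairs[p].mark == in[i + 1]) {
                    unit     = kPairs[p].composed;
                    consumed = 2;
                    break;
                }
            }
        }
        out[n++] = unit;
        i += consumed;
    }
    return n;
}
```

Given the input U+0041 U+0301 U+0065 U+030A, the sketch writes U+00C1 U+0065 U+030A: the A acute collapses to one code unit, but the combining ring above survives, so code that consumes precomposed text still has to cope with combining characters.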

The Unicode Consortium web site has a wealth of information about Unicode. Of particular interest is Unicode Standard Annex #15, Unicode Normalization Forms. The terms precomposed and decomposed, as used in this Q&A, correspond to Unicode Normalization Forms C (NFC) and D (NFD), respectively.

Converting to Precomposed on Mac OS X 10.2

Mac OS X 10.2 introduced two APIs for converting a Unicode string to its precomposed form. The simpler of the two is CFStringNormalize. Listing 1 shows the prototype for this function. For more detail, read the comments in the <CoreFoundation/CFString.h> header file.



typedef enum {
    kCFStringNormalizationFormD = 0,
    kCFStringNormalizationFormKD,
    kCFStringNormalizationFormC,
    kCFStringNormalizationFormKC
} CFStringNormalizationForm;

void CFStringNormalize(CFMutableStringRef theString,
                       CFStringNormalizationForm theForm);

Listing 1. Prototype for CFStringNormalize
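As a usage illustration for the prototype in Listing 1, the following sketch wraps CFStringNormalize in a small helper. PrecomposeUTF16 is a hypothetical name, not part of any Apple API, and the buffer-based interface is an assumption made for the example. Unlike the Unicode Converter, Core Foundation measures lengths in UTF-16 code units rather than bytes. The __APPLE__ guard is only there so the file still compiles as part of a cross-platform source tree.

```c
#include <assert.h>

#if defined(__APPLE__)
#include <CoreFoundation/CoreFoundation.h>

/* Hypothetical helper, not part of any Apple API: normalizes inLen UTF-16
   code units to precomposed form (Form C) and copies the result to out.
   Returns the number of code units written, or -1 if allocation fails or
   the result does not fit in outCapacity code units. */
static CFIndex PrecomposeUTF16(const UniChar *in, CFIndex inLen,
                               UniChar *out, CFIndex outCapacity)
{
    CFMutableStringRef str;
    CFIndex            resultLen;

    assert(in  != NULL);
    assert(out != NULL);

    /* A maximum length of 0 means the string may grow without limit. */
    str = CFStringCreateMutable(kCFAllocatorDefault, 0);
    if (str == NULL) {
        return -1;
    }
    CFStringAppendCharacters(str, in, inLen);

    /* CFStringNormalize rewrites the mutable string in place. */
    CFStringNormalize(str, kCFStringNormalizationFormC);

    resultLen = CFStringGetLength(str);
    if (resultLen <= outCapacity) {
        CFStringGetCharacters(str, CFRangeMake(0, resultLen), out);
    } else {
        resultLen = -1;
    }
    CFRelease(str);
    return resultLen;
}
#endif /* __APPLE__ */
```

Build with -framework CoreFoundation. Passing the decomposed pair U+0041 U+0301 should produce the single code unit U+00C1; passing kCFStringNormalizationFormD instead would convert in the opposite direction.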



In addition, on Mac OS X 10.2 you can use the Unicode Converter to convert a Unicode string to its precomposed form. The code in Listing 2 shows how (assuming you pass true to the precomposed parameter).



#include <assert.h>
#include <CoreServices/CoreServices.h>

static OSStatus ConvertUnicodeToCanonical(
            Boolean precomposed,
            const UniChar *inputBuf, ByteCount inputBufLen,
            UniChar *outputBuf, ByteCount outputBufSize,
            ByteCount *outputBufLen)
    /* As is standard for the Unicode Converter, all lengths are in bytes. */
{
    OSStatus            err;
    OSStatus            junk;
    TextEncodingVariant variant;
    UnicodeToTextInfo   uni;
    UnicodeMapping      map;
    ByteCount           junkRead;

    assert(inputBuf     != NULL);
    assert(outputBuf    != NULL);
    assert(outputBufLen != NULL);

    if (precomposed) {
        variant = kUnicodeCanonicalCompVariant;
    } else {
        variant = kUnicodeCanonicalDecompVariant;
    }
    map.unicodeEncoding = CreateTextEncoding(kTextEncodingUnicodeDefault,
                                             kUnicodeNoSubset,
                                             kTextEncodingDefaultFormat);
    map.otherEncoding   = CreateTextEncoding(kTextEncodingUnicodeDefault,
                                             variant,
                                             kTextEncodingDefaultFormat);
    map.mappingVersion  = kUnicodeUseLatestMapping;

    uni = NULL;

    err = CreateUnicodeToTextInfo(&map, &uni);
    if (err == noErr) {
        err = ConvertFromUnicodeToText(uni, inputBufLen, inputBuf,
                                       kUnicodeDefaultDirectionMask,
                                       0, NULL, NULL, NULL,
                                       outputBufSize, &junkRead,
                                       outputBufLen, outputBuf);
    }

    if (uni != NULL) {
        junk = DisposeUnicodeToTextInfo(&uni);
        assert(junk == noErr);
    }

    return err;
}

Listing 2. Using the Unicode Converter to create precomposed Unicode


Note the following three points about this code.

  • You can use it to create decomposed Unicode by passing false to the precomposed parameter.
  • The code uses ConvertFromUnicodeToText, not ConvertFromTextToUnicode. You cannot convert directly from a non-Unicode encoding to precomposed Unicode.
  • The code uses the low-level Unicode Converter, not the Text Encoding Converter (TEC). TEC does not support converting to precomposed Unicode.

To convert an arbitrary string whose encoding is not Unicode to precomposed Unicode, you must A) convert the string to Unicode (using the Unicode Converter or TEC), and then B) convert that Unicode to precomposed Unicode (using the code shown above).

Note:
When converting to Unicode, TEC preserves the precomposed or decomposed nature of the source encoding. For example, MacRoman does not support decomposed characters, so TEC generates precomposed Unicode by default. In contrast, GB 18030 does support decomposed characters, and TEC preserves them as decomposed Unicode. So, if you know the nature of the source text's encoding, you may be able to use that knowledge to avoid step B above.

Converting to Precomposed on Older Systems

The solutions described above do not apply to earlier versions of Mac OS. If you need to convert to precomposed Unicode on Mac OS X 10.1.x or earlier, you must write your own code. Consider using the normalization functions provided by International Components for Unicode (IBM's open source code for Unicode internationalization).


[February 7, 2003]